AITopics | answer selection

Collaborating Authors

answer selection

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Uncertainty-Aware Answer Selection for Improved Reasoning in Multi-LLM Systems

Agrawal, Aakriti, Aralikatti, Rohith, Satheesh, Anirudh, Chakraborty, Souradip, Bedi, Amrit Singh, Huang, Furong

arXiv.org Artificial IntelligenceOct-6-2025

Large Language Models (LLMs) have demonstrated exceptional capabilities, yet selecting the most reliable response from multiple LLMs remains a challenge, particularly in resource-constrained settings. Existing approaches often depend on costly external verifiers, human evaluators, or self-consistency techniques that require multiple samples from a single model. While multi-LLM systems produce more diverse responses than single models and thus have greater potential, they often underperform compared to single LLM self-consistency. We propose a principled, novel and computationally efficient method to select the best response from multiple different LLMs using a calibrated log-likelihood score, implicitly leveraging the inherent knowledge and confidence of these models. Our method demonstrates improvements of approx. 4%, 3%, and 5% across both debate (multi-round LLM discussions) and non-debate (Best-of-N with multiple LLMs) settings on GSM8K, MMLU (6 subsets), and ARC datasets respectively.

arxiv preprint arxiv, large language model, natural language, (13 more...)

arXiv.org Artificial Intelligence

2510.02377

Country: North America > United States > Maryland (0.04)

Genre: Research Report (0.82)

Industry: Education (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Uncertainty-Based Methods for Automated Process Reward Data Construction and Output Aggregation in Mathematical Reasoning

Han, Jiuzhou, Buntine, Wray, Shareghi, Ehsan

arXiv.org Artificial IntelligenceAug-5-2025

Large language models have demonstrated remarkable capabilities in complex mathematical reasoning tasks, but they inevitably generate errors throughout multi-step solutions. Process-level Reward Models (PRMs) have shown great promise by providing supervision and evaluation at each intermediate step, thereby effectively improving the models' reasoning abilities. However, training effective PRMs requires high-quality process reward data, yet existing methods for constructing such data are often labour-intensive or inefficient. In this paper, we propose an uncertainty-driven framework for automated process reward data construction, encompassing both data generation and annotation processes for PRMs. Additionally, we identify the limitations of both majority vote and PRMs, and introduce two generic uncertainty-aware output aggregation methods: Hybrid Majority Reward Vote and Weighted Reward Frequency Vote, which combine the strengths of majority vote with PRMs. Extensive experiments on ProcessBench, MATH, and GSMPlus show the effectiveness and efficiency of the proposed PRM data construction framework, and demonstrate that the two output aggregation methods further improve the mathematical reasoning abilities across diverse PRMs. The code and data will be publicly available at https://github.com/Jiuzhouh/UnPRM.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2508.01773

Country:

Europe > Austria > Vienna (0.14)
Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > Singapore (0.04)
Africa > Rwanda > Kigali > Kigali (0.04)

Genre:

Workflow (1.00)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4

Yao, Jiarui, Wang, Ruida, Zhang, Tong

arXiv.org Artificial IntelligenceMar-5-2025

Large Language Models (LLMs) have displayed astonishing abilities in various tasks, especially in text generation, classification, question answering, etc. However, the reasoning ability of LLMs still faces many debates. The inherent ambiguity of Natural Language (NL) limits LLMs' ability to perform verifiable reasoning, making its answers lack coherence and trustworthy support. To tackle the above problems, we propose a novel framework named FANS: Formal ANswer Selection for Natural Language Math Reasoning Using Lean4. To the best of our knowledge, it is the first framework that utilizes Lean4 to enhance LLMs' NL math reasoning ability. In particular, given an NL math question and LLM-generated answers, FANS first translates it into Lean4 theorem statements. Then it tries to prove it using a Lean4 prover and verify it by Lean4. Finally, it uses the FL result to assist in answer selection. It enhances LLMs' NL math ability in providing a computer-verifiable solution for its correct answer and proposes an alternative method for answer selection beyond the reward model. Extensive experiments indicate the effectiveness of our framework. It can improve the accuracy rate of reward model enhanced LLMs in the MATH-500 dataset by at most 1.91% and AMC-23 by at most 8.33% on strong reward-model baselines. In some particular fields like number theory that Lean4 experts in, we can even select all correct solutions. The qualitative analysis also shows our framework can make NL results formally backed by Lean4 proofs. As a pioneering work in the corresponding field, we will open-source all our models and datasets to further boost the development of the field.

arxiv preprint arxiv, language statement, reasoning, (14 more...)

arXiv.org Artificial Intelligence

2503.03238

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Pre-training, Fine-tuning and Re-ranking: A Three-Stage Framework for Legal Question Answering

Ni, Shiwen, Cheng, Hao, Yang, Min

arXiv.org Artificial IntelligenceDec-27-2024

Legal question answering (QA) has attracted increasing attention from people seeking legal advice, which aims to retrieve the most applicable answers from a large-scale database of question-answer pairs. Previous methods mainly use a dual-encoder architecture to learn dense representations of both questions and answers. However, these methods could suffer from lacking domain knowledge and sufficient labeled training data. In this paper, we propose a three-stage (\underline{p}re-training, \underline{f}ine-tuning and \underline{r}e-ranking) framework for \underline{l}egal \underline{QA} (called PFR-LQA), which promotes the fine-grained text representation learning and boosts the performance of dense retrieval with the dual-encoder architecture. Concretely, we first conduct domain-specific pre-training on legal questions and answers through a self-supervised training objective, allowing the pre-trained model to be adapted to the legal domain. Then, we perform task-specific fine-tuning of the dual-encoder on legal question-answer pairs by using the supervised learning objective, leading to a high-quality dual-encoder for the specific downstream QA task. Finally, we employ a contextual re-ranking objective to further refine the output representations of questions produced by the document encoder, which uses contextual similarity to increase the discrepancy between the anchor and hard negative samples for better question re-ranking. We conduct extensive experiments on a manually annotated legal QA dataset. Experimental results show that our PFR-LQA method achieves better performance than the strong competitors for legal question answering.

machine learning, natural language, question answering, (19 more...)

arXiv.org Artificial Intelligence

2412.19482

Country: Asia > China > Guangdong Province > Shenzhen (0.05)

Genre: Research Report > New Finding (0.34)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.95)

Add feedback

A Differentiable Integer Linear Programming Solver for Explanation-Based Natural Language Inference

Thayaparan, Mokanarangan, Valentino, Marco, Freitas, André

arXiv.org Artificial IntelligenceApr-3-2024

Integer Linear Programming (ILP) has been proposed as a formalism for encoding precise structural and semantic constraints for Natural Language Inference (NLI). However, traditional ILP frameworks are non-differentiable, posing critical challenges for the integration of continuous language representations based on deep learning. In this paper, we introduce a novel approach, named Diff-Comb Explainer, a neuro-symbolic architecture for explanation-based NLI based on Differentiable BlackBox Combinatorial Solvers (DBCS). Differently from existing neuro-symbolic solvers, Diff-Comb Explainer does not necessitate a continuous relaxation of the semantic constraints, enabling a direct, more precise, and efficient incorporation of neural representations into the ILP formulation. Our experiments demonstrate that Diff-Comb Explainer achieves superior performance when compared to conventional ILP solvers, neuro-symbolic black-box solvers, and Transformer-based encoders. Moreover, a deeper analysis reveals that Diff-Comb Explainer can significantly improve the precision, consistency, and faithfulness of the constructed explanations, opening new opportunities for research on neuro-symbolic architectures for explainable and transparent NLI in complex domains.

constraint, diff-comb explainer, explanation, (13 more...)

arXiv.org Artificial Intelligence

2404.02625

Country:

Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
Europe > Switzerland (0.04)
Europe > Italy > Tuscany > Florence (0.04)
Asia > China (0.04)

Genre: Research Report (0.84)

Industry: Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.69)

Add feedback

Enhancing Answer Selection in Community Question Answering with Pre-trained and Large Language Models

Hu, Xinghang

arXiv.org Artificial IntelligenceNov-29-2023

Community Question Answering (CQA) becomes increasingly prevalent in recent years. However, there are a large number of answers, which is difficult for users to select the relevant answers. Therefore, answer selection is a very significant subtask of CQA. In this paper, we first propose the Question-Answer cross attention networks (QAN) with pre-trained models for answer selection and utilize large language model (LLM) to perform answer selection with knowledge augmentation. Specifically, we apply the BERT model as the encoder layer to do pre-training for question subjects, question bodies and answers, respectively, then the cross attention mechanism selects the most relevant answer for different questions. Experiments show that the QAN model achieves state-of-the-art performance on two datasets, SemEval2015 and SemEval2017. Moreover, we use the LLM to generate external knowledge from questions and correct answers to achieve knowledge augmentation for the answer selection task by LLM, while optimizing the prompt of LLM in different aspects. The results show that the introduction of external knowledge can improve the correct answer selection rate of LLM on datasets SemEval2015 and SemEval2017. Meanwhile, LLM can also select the correct answer on more questions by optimized prompt.

answer selection, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2311.17502

Country:

Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
Oceania > Australia (0.04)
(7 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Intent-calibrated Self-training for Answer Selection in Open-domain Dialogues

Deng, Wentao, Pei, Jiahuan, Ren, Zhaochun, Chen, Zhumin, Ren, Pengjie

arXiv.org Artificial IntelligenceJul-13-2023

Answer selection in open-domain dialogues aims to select an accurate answer from candidates. Recent success of answer selection models hinges on training with large amounts of labeled data. However, collecting large-scale labeled data is labor-intensive and time-consuming. In this paper, we introduce the predicted intent labels to calibrate answer labels in a self-training paradigm. Specifically, we propose the intent-calibrated self-training (ICAST) to improve the quality of pseudo answer labels through the intent-calibrated answer selection paradigm, in which we employ pseudo intent labels to help improve pseudo answer labels. We carry out extensive experiments on two benchmark datasets with open-domain dialogues. The experimental results show that ICAST outperforms baselines consistently with 1%, 5% and 10% labeled data. Specifically, it improves 2.06% and 1.00% of F1 score on the two datasets, compared with the strongest baseline with only 5% labeled data.

artificial intelligence, machine learning, selection, (17 more...)

arXiv.org Artificial Intelligence

2307.06703

Country:

North America > United States > Massachusetts (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Nepal (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Question-Answer Sentence Graph for Joint Modeling Answer Selection

Iyer, Roshni G., Vu, Thuy, Moschitti, Alessandro, Sun, Yizhou

arXiv.org Artificial IntelligenceApr-23-2023

This research studies graph-based approaches for Answer Sentence Selection (AS2), an essential component for retrieval-based Question Answering (QA) systems. During offline learning, our model constructs a small-scale relevant training graph per question in an unsupervised manner, and integrates with Graph Neural Networks. Graph nodes are question sentence to answer sentence pairs. We train and integrate state-of-the-art (SOTA) models for computing scores between question-question, question-answer, and answer-answer pairs, and use thresholding on relevance scores for creating graph edges. Online inference is then performed to solve the AS2 task on unseen queries. Experiments on two well-known academic benchmarks and a real-world dataset show that our approach consistently outperforms SOTA QA baseline models.

machine learning, natural language, node, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.48550/arXiv.2203.03549

2203.03549

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > United Kingdom > Scotland (0.04)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
(11 more...)

Genre: Research Report (0.82)

Industry:

Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (0.47)
Leisure & Entertainment > Sports > Basketball (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Best-Answer Prediction in Q&A Sites Using User Information

Hadfi, Rafik, Moustafa, Ahmed, Yoshino, Kai, Ito, Takayuki

arXiv.org Artificial IntelligenceDec-14-2022

Community Question Answering (CQA) sites have spread and multiplied significantly in recent years. Sites like Reddit, Quora, and Stack Exchange are becoming popular amongst people interested in finding answers to diverse questions. One practical way of finding such answers is automatically predicting the best candidate given existing answers and comments. Many studies were conducted on answer prediction in CQA but with limited focus on using the background information of the questionnaires. We address this limitation using a novel method for predicting the best answers using the questioner's background information and other features, such as the textual content or the relationships with other participants. Our answer classification model was trained using the Stack Exchange dataset and validated using the Area Under the Curve (AUC) metric. The experimental results show that the proposed method complements previous methods by pointing out the importance of the relationships between users, particularly throughout the level of involvement in different communities on Stack Exchange. Furthermore, we point out that there is little overlap between user-relation information and the information represented by the shallow text features and the meta-features, such as time differences.

machine learning, natural language, questioner, (21 more...)

arXiv.org Artificial Intelligence

2212.08475

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Asia > China > Beijing > Beijing (0.04)
North America > United States > Texas (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (0.41)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Going Beyond Approximation: Encoding Constraints for Explainable Multi-hop Inference via Differentiable Combinatorial Solvers

Thayaparan, Mokanarangan, Valentino, Marco, Freitas, André

arXiv.org Artificial IntelligenceAug-5-2022

Integer Linear Programming (ILP) provides a viable mechanism to encode explicit and controllable assumptions about explainable multi-hop inference with natural language. However, an ILP formulation is non-differentiable and cannot be integrated into broader deep learning architectures. Recently, Thayaparan et al. (2021a) proposed a novel methodology to integrate ILP with Transformers to achieve end-to-end differentiability for complex multi-hop inference. While this hybrid framework has been demonstrated to deliver better answer and explanation selection than transformer-based and existing ILP solvers, the neuro-symbolic integration still relies on a convex relaxation of the ILP formulation, which can produce sub-optimal solutions. To improve these limitations, we propose Diff-Comb Explainer, a novel neuro-symbolic architecture based on Differentiable BlackBox Combinatorial solvers (DBCS) (Pogan\v{c}i\'c et al., 2019). Unlike existing differentiable solvers, the presented model does not require the transformation and relaxation of the explicit semantic constraints, allowing for direct and more efficient integration of ILP formulations. Diff-Comb Explainer demonstrates improved accuracy and explainability over non-differentiable solvers, Transformers and existing differentiable constraint-based multi-hop inference frameworks.

explanation, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2208.03339

Country:

Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
Europe > Switzerland (0.04)
Europe > Italy (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback